AITopics | error landscape

Phase Transitions between Accuracy Regimes in L2 regularized Deep Neural Networks

Ersoy, Ibrahim Talha, Wiesner, Karoline

arXiv.org Artificial Intelligence, Aug-29-2025

Increasing the L2 regularization of Deep Neural Networks (DNNs) causes a first-order phase transition into the under-parametrized phase, the so-called onset of learning. We explain this transition via the scalar (Ricci) curvature of the error landscape. We predict new transition points as the data complexity is increased and, in accordance with the theory of phase transitions, the existence of hysteresis effects. We confirm both predictions numerically. Our results provide a natural explanation of the recently discovered phenomenon of 'grokking' as DNN models getting stuck in a local minimum of the error surface, corresponding to a lower accuracy phase. Our work paves the way for new probing methods of the intrinsic structure of DNNs in and beyond the L2 context.

artificial intelligence, machine learning, transition, (17 more...)

arXiv:2505.06597

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Quantifying Behavioural Distance Between Mathematical Expressions

Mežnar, Sebastian, Džeroski, Sašo, Todorovski, Ljupčo

arXiv.org Artificial Intelligence, Aug-21-2024

Existing symbolic regression methods organize the space of candidate mathematical expressions primarily based on their syntactic, structural similarity. However, this approach overlooks crucial equivalences between expressions that arise from mathematical symmetries, such as commutativity, associativity, and distribution laws for arithmetic operations. Consequently, expressions with similar errors on a given data set can lie far apart from each other in the search space. This leads to a rough error landscape in the search space that efficient local, gradient-based methods cannot explore. This paper proposes and implements a measure of behavioral distance, BED, that clusters together expressions with similar errors. The experimental results show that the stochastic method for calculating BED achieves consistency with a modest number of sampled values for evaluating the expressions. This leads to computational efficiency comparable to the tree-based syntactic distance. Our findings also reveal that BED significantly improves the smoothness of the error landscape in the search space for symbolic regression.
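To make the idea of comparing expressions by behaviour rather than syntax concrete, here is a minimal sketch. It is not the paper's BED definition: the function name `behavioural_distance`, the sampling range, and the use of a mean absolute difference on shared random inputs are all assumptions chosen for illustration.

```python
import numpy as np

def behavioural_distance(f, g, n_samples=64, low=-5.0, high=5.0, seed=0):
    """Mean absolute difference of two expressions on shared random inputs.

    A simplified stand-in for illustration; NOT the paper's BED definition.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=n_samples)
    y = rng.uniform(low, high, size=n_samples)
    return float(np.mean(np.abs(f(x, y) - g(x, y))))

# Syntactically different, behaviourally identical (distribution law).
expr_a = lambda x, y: x * (y + 1.0)
expr_b = lambda x, y: x * y + x
# Syntactically close to expr_a, behaviourally different.
expr_c = lambda x, y: x * (y - 1.0)

d_ab = behavioural_distance(expr_a, expr_b)
d_ac = behavioural_distance(expr_a, expr_c)
print(d_ab, d_ac)
```

Under these assumptions, `d_ab` is zero up to floating-point rounding even though the two expression trees differ, while `d_ac` is clearly positive even though the trees differ in a single operator.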

behavioural distance, expression, smoothness, (12 more...)

arXiv:2408.11515

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Understanding the Convergence in Balanced Resonate-and-Fire Neurons

Higuchi, Saya, Bohte, Sander M., Otte, Sebastian

arXiv.org Artificial Intelligence, Jun-1-2024

Resonate-and-Fire (RF) neurons are an interesting complementary model for integrator neurons in spiking neural networks (SNNs). Due to their resonating membrane dynamics they can extract frequency patterns within the time domain. While established RF variants suffer from intrinsic shortcomings, the recently proposed balanced resonate-and-fire (BRF) neuron marked a significant methodological advance in terms of task performance, spiking and parameter efficiency, as well as general stability and robustness, demonstrated for recurrent SNNs in various sequence learning tasks. One of the most intriguing results, however, was an immense improvement in training convergence speed and smoothness, overcoming the typical convergence dilemma in backprop-based SNN training. This paper aims at providing further intuitions about how and why these convergence advantages emerge. We show that BRF neurons, in contrast to well-established ALIF neurons, span a very clean and smooth, almost convex, error landscape. Furthermore, empirical results reveal that the convergence benefits are predominantly coupled with a divergence boundary-aware optimization, a major component of the BRF formulation that addresses the numerical stability of the time-discrete resonator approximation. These results are supported by a formal investigation of the membrane dynamics indicating that the gradient is transferred back through time without loss of magnitude.

convergence, divergence boundary, neuron, (14 more...)

arXiv:2406.00389

Country:

Europe > Germany (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Model soups to increase inference without increasing compute time

Dansereau, Charles, Sobral, Milo, Bhogal, Maninder, Zalai, Mehdi

arXiv.org Artificial Intelligence, Jan-24-2023

In this paper, we compare Model Soups performances on three different models (ResNet, ViT and EfficientNet) using three Soup Recipes (Greedy Soup Sorted, Greedy Soup Random and Uniform soup) from [1], and reproduce the results of the authors. We then introduce a new Soup Recipe called Pruned Soup. Results from the soups were better than the best individual model for the pretrained ...

Leo Breiman published Bagging predictors, where he presented a method that generates multiple versions of a predictor and uses them to get an aggregated predictor. Then, in 1999, Eric Bauer and Ron Kohavi published An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants, where they reviewed many voting classification algorithms, like Bagging and AdaBoost, and showed interesting results. In 2000, Thomas G. Dietterich published Ensemble Methods in Machine Learning, where he explains why ensembles can outperform single classifiers.

accuracy, artificial intelligence, machine learning, (16 more...)

arXiv:2301.10092

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

Paul, Mansheej, Chen, Feng, Larsen, Brett W., Frankle, Jonathan, Ganguli, Surya, Dziugaite, Gintare Karolina

arXiv.org Artificial Intelligence, Oct-6-2022

Modern deep learning involves training costly, highly overparameterized networks, thus motivating the search for sparser networks that can still be trained to the same accuracy as the full network (i.e. matching). Iterative magnitude pruning (IMP) is a state-of-the-art algorithm that can find such highly sparse matching subnetworks, known as winning tickets. IMP operates by iterative cycles of training, masking smallest magnitude weights, rewinding back to an early training point, and repeating. Despite its simplicity, the underlying principles for when and how IMP finds winning tickets remain elusive. In particular, what useful information does an IMP mask found at the end of training convey to a rewound network near the beginning of training? How does SGD allow the network to extract this information? And why is iterative pruning needed? We develop answers in terms of the geometry of the error landscape. First, we find that, at higher sparsities, pairs of pruned networks at successive pruning iterations are connected by a linear path with zero error barrier if and only if they are matching. This indicates that masks found at the end of training convey the identity of an axial subspace that intersects a desired linearly connected mode of a matching sublevel set. Second, we show SGD can exploit this information due to a strong form of robustness: it can return to this mode despite strong perturbations early in training. Third, we show how the flatness of the error landscape at the end of training determines a limit on the fraction of weights that can be pruned at each iteration of IMP. Finally, we show that the role of retraining in IMP is to find a network with new small weights to prune. Overall, these results make progress toward demystifying the existence of winning tickets by revealing the fundamental role of error landscape geometry.
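The IMP cycle described in the abstract (train, mask the smallest-magnitude weights, rewind, repeat) can be sketched on a toy problem. This is only an illustration under stated assumptions: a noise-free linear regression stands in for a deep network, plain gradient descent stands in for SGD, and the names `train` and `imp_with_rewinding` as well as all hyperparameters are hypothetical rather than taken from the paper.

```python
import numpy as np

def train(w, mask, X, y, steps, lr=0.05):
    """Gradient descent on mean squared error; masked weights stay at zero."""
    w = w * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

def imp_with_rewinding(X, y, w_init, rounds=3, prune_frac=0.3,
                       rewind_steps=5, train_steps=300):
    mask = np.ones_like(w_init)
    # "Early training point": the dense model after a few steps (assumption).
    w_rewind = train(w_init, mask, X, y, rewind_steps)
    for _ in range(rounds):
        w_final = train(w_rewind, mask, X, y, train_steps)
        alive = np.flatnonzero(mask)
        n_prune = int(prune_frac * alive.size)
        # Mask the surviving weights with the smallest magnitude.
        smallest = alive[np.argsort(np.abs(w_final[alive]))[:n_prune]]
        mask[smallest] = 0.0
        # Rewinding: the next round restarts from w_rewind under the new mask.
    return mask, train(w_rewind, mask, X, y, train_steps)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:5] = [3.0, -2.0, 1.5, 4.0, -3.5]   # only 5 weights matter
y = X @ w_true
mask, w_sparse = imp_with_rewinding(X, y, rng.normal(size=20) * 0.1)
mse = float(np.mean((X @ w_sparse - y) ** 2))
print(int(mask.sum()), mse)
```

In this toy setting the sparse model is "matching" in the abstract's sense because the pruned weights were irrelevant to begin with; real networks offer no such guarantee, which is the question the paper studies.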

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv:2210.03044

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Contests & Prizes (1.00)
Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

A Constructive Prediction of the Generalization Error Across Scales

Rosenfeld, Jonathan S., Rosenfeld, Amir, Belinkov, Yonatan, Shavit, Nir

arXiv.org Machine Learning, Sep-27-2019

The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form which approximates well the generalization error in practice. Capitalizing on the successful concept of model scaling (e.g., width, depth), we are able to simultaneously construct such a form and specify the exact models which can attain it across model/data scales. Our construction follows insights obtained from observations conducted over a range of model/data scales, in various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data.

deep learning, error landscape, upstream oil & gas, (17 more...)

arXiv:1909.12673

Country:

Asia (0.28)
Europe > Italy (0.14)
Oceania > Australia (0.14)
(2 more...)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions

Bosman, Anna Sergeevna, Engelbrecht, Andries, Helbig, Mardé

arXiv.org Machine Learning, Jan-9-2019

Quantification of the stationary points and the associated basins of attraction of neural network loss surfaces is an important step towards a better understanding of neural network loss surfaces at large. This work proposes a novel method to visualise basins of attraction together with the associated stationary points via gradient-based random sampling. The proposed technique is used to perform an empirical study of the loss surfaces generated by two different error metrics: quadratic loss and entropic loss. The empirical observations confirm the theoretical hypothesis regarding the nature of neural network attraction basins. Entropic loss is shown to exhibit stronger gradients and fewer stationary points than quadratic loss, indicating that entropic loss has a more searchable landscape. Quadratic loss is shown to be more resilient to overfitting than entropic loss. Both losses are shown to exhibit local minima, but the number of local minima is shown to decrease with an increase in dimensionality. Thus, the proposed visualisation technique successfully captures the local minima properties exhibited by the neural network loss surfaces, and can be used for the purpose of fitness landscape analysis of neural networks.

attractor, gradient walk, minima, (15 more...)

arXiv:1901.02302

Country:

Africa > South Africa > Gauteng > Pretoria (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Oceania > Australia > Queensland (0.04)
(7 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

r/MachineLearning - [D] Visualizing and analyzing error landscapes

#artificialintelligence, Nov-4-2018, 17:46:46 GMT

It's difficult to visualize and understand the high-dimensional error landscapes (i.e., cost functions) of neural nets and other machine learning algorithms. A common method is to project the parameter space onto two dimensions and plot a surface. What are some effective choices for this projection that help visualize salient features of the error? Are there nonlinear approaches that are better? More importantly, what is known about the geometry of these cost functions for neural networks trained on real data?
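One simple version of the two-dimensional projection mentioned in the post evaluates the loss on a plane through a chosen parameter vector, spanned by two random directions. The sketch below assumes a tiny tanh network and random unit directions; the names `loss` and `loss_slice` and every size are hypothetical, and random directions are only one of many possible projection choices.

```python
import numpy as np

def loss(theta, X, y):
    """Mean squared error of a one-hidden-layer tanh network (3 hidden units)."""
    W1 = theta[:6].reshape(2, 3)   # input (2) -> hidden (3)
    w2 = theta[6:9]                # hidden (3) -> scalar output
    return float(np.mean((np.tanh(X @ W1) @ w2 - y) ** 2))

def loss_slice(theta_center, d1, d2, X, y, radius=1.0, n=25):
    """Loss on an n-by-n grid of theta_center + a*d1 + b*d2."""
    coords = np.linspace(-radius, radius, n)
    grid = np.empty((n, n))
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            grid[i, j] = loss(theta_center + a * d1 + b * d2, X, y)
    return coords, grid

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
theta_star = rng.normal(size=9)
# Targets generated by theta_star, so the loss is zero at the grid centre.
y = np.tanh(X @ theta_star[:6].reshape(2, 3)) @ theta_star[6:9]

# Two random unit directions in parameter space (one possible projection).
d1, d2 = rng.normal(size=(2, 9))
d1 /= np.linalg.norm(d1)
d2 /= np.linalg.norm(d2)

coords, grid = loss_slice(theta_star, d1, d2, X, y)
print(grid.shape, grid.min())
```

The resulting `grid` can be passed to a surface or contour plot, for example Matplotlib's `contourf(coords, coords, grid.T)`. A slice like this shows only a two-dimensional cut, so features that lie outside the chosen plane stay invisible.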

artificial intelligence, error landscape, machine learning, (2 more...)

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
